ABSTRACT. In this article a unified approach to iterative soft-thresholding
algorithms for the solution of linear operator equations in infinite dimensional 
Hilbert spaces is presented. We formulate the algorithm in the framework 
of generalized gradient methods and present a new convergence analysis. As 
main result we show that the algorithm converges with linear rate as soon as the 
underlying operator satisfies the so-called finite basis injectivity property or the 
minimizer possesses a so-called strict sparsity pattern. Moreover it is shown 
that the constants can be calculated explicitly in special cases (i.e. for compact 
operators). Furthermore, the techniques also can be used to establish linear 
convergence for related methods such as the iterative thresholding algorithm 
for joint sparsity and the accelerated gradient projection method. 



This paper is concerned with the convergence analysis of numerical algo- 
rithms for the solution of linear inverse problems in the infinite-dimensional 
setting with so-called sparsity constraints. The background for this type of 
problem is, for example, the attempt to solve the linear operator equation 
$Ku = f$ in an infinite-dimensional Hilbert space which models the connection
between some quantity of interest $u$ and some measurements $f$. Often,
the measurements $f$ contain noise which makes the direct inversion ill-posed
and practically impossible. Thus, instead of considering the linear equation, 
a regularized problem is posed for which the solution is stable with respect 
to noise. A common approach is to regularize by minimizing a Tikhonov 
functional [7, 15, 28]. A special class of these regularizations has been of
recent interest, namely of the type

$$ \min_{u \in \ell^2} \; \frac{\|Ku - f\|^2}{2} + \sum_{k=1}^{\infty} \alpha_k |u_k| . \tag{1.1} $$

These problems model the fact that the quantity of interest $u$ is composed of
a few elements, i.e. it is sparse in some given, countable basis. To make this
precise, let $A : \mathcal{H}_1 \to \mathcal{H}_2$ be a bounded operator between two Hilbert spaces
and let $\{\psi_k\}$ be an orthonormal basis of $\mathcal{H}_1$. Denote with $B : \ell^2 \to \mathcal{H}_1$ the
synthesis operator $B(u_k) = \sum_k u_k \psi_k$. Then the problem



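$$ \min_{u \in \mathcal{H}_1} \; \frac{\|Au - f\|^2}{2} + \sum_{k} \alpha_k |\langle u, \psi_k \rangle| $$

can be rephrased as (1.1) with $K = AB$. Indeed, solutions of this type of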
problem admit only finitely many non-zero coefficients and often coincide 
with the sparsest solution possible [10,18,20]. 

Unfortunately, the numerical solution of the above (non-smooth) mini- 
mization problem is not straightforward. There is a vast amount of literature 
dealing with efficient computational algorithms for equivalent formulations 
of the problem [8, 12, 14, 16, 21, 22, 27, 33], both in the infinite-dimensional 
setting as well as for finitely many dimensions, but mostly for the finite- 
dimensional case. An often-used, simple but apparently slow algorithm is 
the iterative soft-thresholding (or thresholded Landweber) procedure which 
is known to converge in the strong sense in infinite dimensions [7]. The 
algorithm is simple: it just needs an initial value $u^0$ and an operator $K$ with
$\|K\| < 1$. The iteration reads as

$$ u^{n+1} = S_{\alpha}\bigl(u^n - K^*(Ku^n - f)\bigr) , \qquad \bigl(S_{\alpha}(w)\bigr)_k = \operatorname{sgn}(w_k)\,\bigl[\,|w_k| - \alpha_k\bigr]_+ . $$



In practice it is important to know moreover convergence rates for the al- 
gorithms or at least an estimate for the distance to a minimizer to eval- 
uate the fidelity of the outcome of the computations. The convergence 
proofs in the infinite-dimensional case presented in [7], and for generaliza- 
tions in [5], however, do not imply a-priori estimates and do not inherently 
give any rate of convergence, although, in many cases, linear convergence 
can be deduced quite easily from the fact that iterative thresholding con- 
verges strongly and from the special structure of the algorithm. To the best 
knowledge of the authors, [3] contains the first results about the convergence 
of iterative algorithms for linear inverse problems with sparsity constraints 
in infinite dimensions for which the convergence rate is inherent in the re- 
spective proof. There, an iterative hard-thresholding procedure has been 
proposed for which, if $K$ is injective, a convergence rate of $\mathcal{O}(n^{-1/2})$ could
be established. 

The main purpose of this paper is to develop a general and unified
framework for the convergence analysis of algorithms for the problem (1.1)
and related problems, especially for the iterative soft-thresholding algorithm.
We show that the iterative soft-thresholding algorithm converges linearly in
almost every case and point out how to obtain a-priori estimates. To this
end, we formulate the iterative soft-thresholding as a generalized gradient
projection method which leads to a new proof for the strong convergence
which is independent of the proof given in [7]. The techniques used for
our approach may shed new light on the known properties of the iterative
soft-thresholding and related methods.

We distinguish two key properties which lead to linear convergence.
The first is called finite basis injectivity (FBI) and is a property of the
operator $K$ only while the second is called a strict sparsity pattern of a
solution of the minimization problem (1.1).

Definition 1. An operator $K : \ell^2 \to \mathcal{H}_2$ mapping into a Hilbert space has
the finite basis injectivity property, if for all finite subsets $I \subset \mathbb{N}$ the operator
$K|_I$ is injective, i.e. for all $u, v \in \ell^2$ with $Ku = Kv$ and $u_k = v_k = 0$ for all
$k \notin I$ it follows $u = v$.

Definition 2. A solution $u^*$ of (1.1) possesses a strict sparsity pattern if
whenever $u^*_k = 0$ for some $k$ there follows $|K^*(Ku^* - f)|_k < \alpha_k$.
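In finite dimensions, Definition 2 can be checked directly once a minimizer is known. The following is a minimal sketch under the assumption that $K$ is given as a small dense matrix (a list of rows); the helper names `matvec`, `transpose` and `has_strict_sparsity_pattern` are hypothetical and only serve this illustration.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def has_strict_sparsity_pattern(K, f, alpha, u_star):
    """Check |K^T (K u* - f)|_k < alpha_k for every index k with u*_k = 0."""
    residual = [a - b for a, b in zip(matvec(K, u_star), f)]
    w = matvec(transpose(K), residual)  # K^*(K u^* - f)
    return all(abs(wk) < ak
               for wk, ak, uk in zip(w, alpha, u_star) if uk == 0.0)

# Toy example with K = diag(1, 0.5); u* = (1.5, 0) minimizes the functional
# for both data vectors below (this can be verified by hand coordinate-wise).
K = [[1.0, 0.0], [0.0, 0.5]]
alpha = [0.5, 0.5]
u_star = [1.5, 0.0]
strict = has_strict_sparsity_pattern(K, [2.0, 0.1], alpha, u_star)
borderline = has_strict_sparsity_pattern(K, [2.0, 1.0], alpha, u_star)
```

In the second call the zero coefficient sits exactly on the threshold, $|K^*(Ku^* - f)|_2 = \alpha_2$, so the pattern is not strict.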

The main result can be summarized by the following: 

Theorem 1. Let $K : \ell^2 \to \mathcal{H}_2$, $K \neq 0$, be a linear and continuous operator
as well as $f \in \mathcal{H}_2$. Consider the sequence $\{u^n\}$ given by the iterative
soft-thresholding procedure

$$ u^{n+1} = S_{s_n\alpha}\bigl(u^n - s_n K^*(Ku^n - f)\bigr) , \qquad \bigl(S_{s_n\alpha}(w)\bigr)_k = \operatorname{sgn}(w_k)\,\bigl[\,|w_k| - s_n\alpha_k\bigr]_+ \tag{1.2} $$

with step size

$$ 0 < \underline{s} \le s_n \le \bar{s} < 2/\|K\|^2 \tag{1.3} $$

and a $u^0 \in \ell^2$ such that $\sum_{k=1}^{\infty} \alpha_k |u^0_k| < \infty$. Then there is a minimizer $u^*$
such that $u^n \to u^*$ in $\ell^2$.

Moreover, suppose that either

1. $K$ possesses the FBI property, or

2. $u^*$ possesses a strict sparsity pattern.

Then, $u^n \to u^*$ with a linear rate, i.e. there exists a $C > 0$ and a $0 \le \lambda < 1$
such that $\|u^n - u^*\| \le C\lambda^n$.
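To make the iteration (1.2) concrete, here is a minimal finite-dimensional sketch with a constant step size. It assumes that $K$ is a small dense matrix stored as a list of rows, so that $K^*$ is the transpose; the function names `soft_threshold` and `ista` are illustrative choices and not taken from the paper.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold(w, thresholds):
    """Componentwise sgn(w_k) * max(|w_k| - thresholds_k, 0)."""
    return [math.copysign(max(abs(x) - t, 0.0), x)
            for x, t in zip(w, thresholds)]

def ista(K, f, alpha, s, u0, iterations):
    """Iterate u <- S_{s*alpha}(u - s * K^T (K u - f)) with constant step s."""
    Kt = transpose(K)
    u = list(u0)
    for _ in range(iterations):
        residual = [a - b for a, b in zip(matvec(K, u), f)]
        grad = matvec(Kt, residual)
        u = soft_threshold([ui - s * g for ui, g in zip(u, grad)],
                           [s * a for a in alpha])
    return u

# Toy problem with K = diag(1, 0.5), so ||K|| = 1 and s = 1 satisfies (1.3).
# Coordinate-wise, the minimizer is (1.5, 0), which can be checked by hand.
K = [[1.0, 0.0], [0.0, 0.5]]
f = [2.0, 0.1]
alpha = [0.5, 0.5]
u = ista(K, f, alpha, s=1.0, u0=[1.0, 1.0], iterations=200)
```

The second coefficient is set to zero by the thresholding after finitely many steps, which is the sparsity effect described in the introduction.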

Remark 1 (Examples for operators with the FBI property). In
the context of inverse problems with sparsity constraints, the FBI property
is natural, since the operators A are often injective. Prominent examples 
are the Radon transform [25], solution operators for partial differential equa- 
tions, e.g. in heat conduction problems [6] or inverse boundary value prob- 
lems like electrical impedance tomography [26]. The combination with a 
synthesis operator $B$ for an orthonormal basis does not influence the
injectivity.

Moreover, the restriction to orthonormal bases can be relaxed. The
results presented in this paper also hold if the system $\{\psi_k\}$ is a frame or even
a dictionary, as long as the FBI property is fulfilled. This is for example the
case for a frame which consists of two orthonormal bases where no element 
of one basis can be written as a finite linear combination of elements of the 
other. This is typically the case, e.g. for a trigonometric basis and the Haar 
wavelet basis on a compact interval. One could speak of FBI frames or FBI 
dictionaries. 

Remark 2 (Strict sparsity pattern). This condition can be interpreted
as follows. We know that the weighted $\ell^1$-regularization imposes
sparsity on a solution $u^*$ in the sense that $u^*_k = 0$ for all but finitely many
$k$, hence the name sparsity constraint. For the remaining indices, the equations
$(K^*Ku^*)_k = (K^*f)_k - \alpha_k \operatorname{sgn}(u^*_k)$ are satisfied which corresponds to an
approximate solution of the generally ill-posed equation $Ku = f$ in a certain
way. Now the condition that the solutions of (1.1) possess a strict sparsity
pattern says that $u^*_k = 0$ for some index $k$ can occur only because of the
sparsity constraint but never for the solution of the linear equation. We
emphasize that Theorem 1 states that whenever $\{u^n\}$ converges to a solution
$u^*$ with strict sparsity pattern, then the speed of convergence has to be
linear for all bounded linear operators $K$.

The proof of Theorem 1 will be divided into three sections. First, in
Section 2 we introduce a framework in which iterative soft-thresholding
according to (1.2) can be interpreted as a generalized gradient projection
method. We derive descent properties for generalized gradient methods and
show under which conditions we can obtain linear convergence in Section 3.
We show in Section 4 that a Bregman-distance estimate for problems of the
type (1.1) gives a new convergence proof for the iterative soft-thresholding.
In Section 5 we illustrate the broad range of applicability of the results with
two more examples. Finally, some conclusions about the implications of the
results are drawn in Section 6.

2. Iterative soft-thresholding and a generalized 
gradient projection method 

A common approach to solving smooth unconstrained minimization problems
is to move in the direction of steepest descent, i.e. the
negative gradient. In constrained optimization, the gradient is often
projected back to the feasible set, yielding the well-known gradient projection
method [11, 19, 23]. In the following, a step of generalization is
introduced: The method is extended to deal with sums of smooth and
non-smooth functionals, and covers in particular constrained smooth
minimization problems. The gain is that the iteration (1.2) fits into this generalized
framework.

Similar to the generalization performed in [4], its main idea is to replace
the constraint by a general proper, convex and lower semi-continuous
functional $\Phi$ which leads, for the gradient projection method, to the successive
application of the associated proximity operators, i.e.

$$ J_s : w \mapsto \operatorname*{argmin}_{v \in \mathcal{H}} \; \frac{\|v - w\|^2}{2} + s\Phi(v) . \tag{2.1} $$

The generalized gradient projection method for minimization problems of
type

$$ \min_{u \in \mathcal{H}} \; F(u) + \Phi(u) \tag{2.2} $$

then reads as follows.
Algorithm 1. 

1. Choose a $u^0 \in \mathcal{H}$ with $\Phi(u^0) < \infty$ and set $n = 0$.

2. Compute the next iterate $u^{n+1}$ according to

$$ u^{n+1} = J_{s_n}\bigl(u^n - s_n F'(u^n)\bigr) , $$

where $s_n$ satisfies an appropriate step-size rule and $J_s$ is taken from (2.1).

3. Set $n := n + 1$ and continue with Step 2.

Note that the solutions of the minimization problem are exactly the fixed
points of the algorithm. Moreover, the case $\Phi = I_\Omega$, where $\Omega$ is a closed
and convex constraint, yields the classical gradient projection method which
is known to converge provided that certain assumptions are fulfilled and a
suitable step-size rule has been chosen [9, 11].
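The structure of Algorithm 1 can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: a constant step size, the quadratic $F(u) = \|u - b\|^2/2$ (so $L = 1$) and $\Phi = I_\Omega$ with $\Omega = [0,1]^2$, for which the proximity operator is the projection onto the box. The names `generalized_gradient_projection`, `grad_F` and `project_box` are hypothetical.

```python
def generalized_gradient_projection(grad_F, prox, u0, step, iterations):
    """Algorithm 1 with constant step: u <- J_s(u - s * F'(u))."""
    u = list(u0)
    for _ in range(iterations):
        g = grad_F(u)
        u = prox([ui - step * gi for ui, gi in zip(u, g)], step)
    return u

b = [1.7, -0.4]

def grad_F(u):
    """Gradient of F(u) = 0.5 * ||u - b||^2."""
    return [ui - bi for ui, bi in zip(u, b)]

def project_box(w, s):
    """Proximity operator of the indicator of [0, 1]^2 (independent of s)."""
    return [min(max(wi, 0.0), 1.0) for wi in w]

# step = 0.5 lies in ]0, 2/L[ with L = 1.
u = generalized_gradient_projection(grad_F, project_box, [0.5, 0.5],
                                    step=0.5, iterations=50)

# The limit is a fixed point of one further step, as noted above.
g = grad_F(u)
u_again = project_box([ui - 0.5 * gi for ui, gi in zip(u, g)], 0.5)
```

Here the minimizer is the projection of `b` onto the box, and replacing `project_box` by a soft-thresholding map turns the same loop into the iteration (1.2).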

In the following, we assume that $F$ is differentiable, $F'$ is Lipschitz
continuous with constant $L$ and usually choose the step-sizes such that

$$ 0 < \underline{s} \le s_n \le \bar{s} < 2/L . \tag{2.3} $$

Note that for the trivial case $L = 0$ we agree that $2/L = \infty$.

Remark 3 (Forward-backward splitting). The generalization of the
gradient projection method leads to a special case of the so-called proximal
forward-backward splitting method which amounts to the iteration

$$ u^{n+1} = u^n + t_n \Bigl( J_{s_n}\bigl(u^n - s_n (F'(u^n) + b^n)\bigr) + a^n - u^n \Bigr) $$

where $t_n \in [0,1]$ and $\{a^n\}, \{b^n\}$ are absolutely summable sequences in $\mathcal{H}$.
In [5], it is shown that this method converges strongly to a minimizer under
appropriate conditions. There exist, however, no general statements about
convergence rates so far. Here, we restrict ourselves to the special case of
the generalized gradient projection method.

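Finally, it is easy to see that the iterative soft-thresholding algorithm
(1.2) is a special case of this generalized gradient projection method in case
the functionals $F : \ell^2 \to \mathbb{R}$ and $\Phi : \ell^2 \to \,]-\infty, \infty]$ are chosen according to

$$ F(u) = \frac{\|Ku - f\|^2}{2} , \qquad \Phi(u) = \begin{cases} \sum_{k=1}^{\infty} \alpha_k |u_k| & \text{if the sum converges} \\ \infty & \text{else} \end{cases} \tag{2.4} $$

where $K : \ell^2 \to \mathcal{H}_2$ is linear and continuous between the Hilbert spaces $\ell^2$
and $\mathcal{H}_2$, $f \in \mathcal{H}_2$ and $\{\alpha_k\}$ is a sequence satisfying $\alpha_k \ge \alpha > 0$ for all $k$.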

Here, $F'(u) = K^*(Ku - f)$, so in each iteration step of Algorithm 1 we
have to solve

$$ \min_{v \in \ell^2} \; \frac{\|u^n - s_n K^*(Ku^n - f) - v\|^2}{2} + s_n \sum_{k=1}^{\infty} \alpha_k |v_k| $$

for which the solution is given by soft-thresholding, i.e.

$$ v = S_{s_n\alpha}\bigl(u^n - s_n K^*(Ku^n - f)\bigr) , $$

with $S_{s_n\alpha}$ according to (1.2), see [7], for example.
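The step from the minimization problem to the thresholding formula can be made explicit by a standard scalar computation. The following sketch assumes only that the problem decouples into independent one-dimensional problems, which holds because both terms are sums over $k$; the symbols $t$ and $\tau$ are introduced here for illustration.

```latex
% One coordinate: fixed w_k \in \mathbb{R} and threshold \tau = s_n \alpha_k > 0.
\min_{t \in \mathbb{R}} \; \frac{(t - w_k)^2}{2} + \tau |t|
% Optimality condition, with \partial|\cdot| = \operatorname{sgn} and \operatorname{sgn}(0) = [-1, 1]:
\quad\Longleftrightarrow\quad w_k - t \in \tau \operatorname{sgn}(t) .
% Case t > 0:  t = w_k - \tau, possible only if w_k > \tau.
% Case t < 0:  t = w_k + \tau, possible only if w_k < -\tau.
% Case t = 0:  requires w_k \in [-\tau, \tau].
% Collecting the three cases:
t = \operatorname{sgn}(w_k) \, \bigl[\, |w_k| - \tau \,\bigr]_+
  = \bigl( S_{s_n \alpha}(w) \bigr)_k .
```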

Since the Lipschitz constant associated with $F'$ does not exceed $\|K\|^2$,
this result can be summarized as follows:

Proposition 1. Let $K : \ell^2 \to \mathcal{H}_2$ be a bounded linear operator, $f \in \mathcal{H}_2$
and $0 < \alpha \le \alpha_k$. Let $F$ and $\Phi$ be chosen according to (2.4). Then
Algorithm 1 with step-size $\{s_n\}$ according to (1.3) coincides with the iterative
soft-thresholding procedure (1.2).

Here and in the following, we also agree to set $2/\|K\|^2 = \infty$ in (1.3)
for the trivial case $K = 0$.



3. Convergence of the generalized gradient 
projection method 

In the following, conditions which ensure convergence of the generalized
gradient projection method are derived. The key is the descent of the
functional $F + \Phi$ in each iteration step. The following lemma states some basic
properties of one iteration.

Lemma 1. Let $F$ be differentiable with $F'$ Lipschitz continuous with
associated constant $L$ and $\Phi$ be proper, convex and lower semi-continuous. Set
$v = J_s\bigl(u - sF'(u)\bigr)$ as in (2.1) for some $s > 0$ and denote by

$$ D_s(u) = \Phi(u) - \Phi(v) + \langle F'(u), u - v \rangle . \tag{3.1} $$

Then it holds:

$$ \forall w \in \mathcal{H} : \quad \Phi(w) - \Phi(v) + \langle F'(u), w - v \rangle \ge \frac{\langle u - v, w - v \rangle}{s} \tag{3.2} $$

$$ D_s(u) \ge \frac{\|v - u\|^2}{s} \tag{3.3} $$

$$ (F + \Phi)(v) \le (F + \Phi)(u) - \Bigl(1 - \frac{sL}{2}\Bigr) D_s(u) . \tag{3.4} $$

Proof. Since $v$ solves the problem

$$ \min_{v \in \mathcal{H}} \; \frac{\|v - u + sF'(u)\|^2}{2} + s\Phi(v) $$

it immediately follows that the subdifferential inclusion $u - sF'(u) - v \in
s\,\partial\Phi(v)$ is satisfied, see [13, 29] for an introduction to convex analysis and
subdifferential calculus. This can be rewritten to

$$ \langle u - sF'(u) - v, w - v \rangle \le s\bigl(\Phi(w) - \Phi(v)\bigr) \quad \text{for all } w \in \mathcal{H} , $$

while rearranging and dividing by $s$ proves the inequality (3.2). The
inequality (3.3) follows by setting $w = u$ in (3.2).
To show inequality (3.4), we observe

$$ (F + \Phi)(v) - (F + \Phi)(u) + D_s(u) = F(v) - F(u) + \langle F'(u), u - v \rangle
= \int_0^1 \langle F'(u + t(v - u)) - F'(u), v - u \rangle \, dt . $$

Using the Cauchy-Schwarz inequality and the Lipschitz continuity we obtain

$$ (F + \Phi)(v) - (F + \Phi)(u) + D_s(u) \le \int_0^1 tL\|v - u\|^2 \, dt = \frac{L}{2}\|v - u\|^2 . $$

Finally, applying the estimate (3.3) leads to (3.4). □

Remark 4 (A weaker step-size condition). If the step-size in the
generalized gradient projection method is chosen such that $s_n \le \bar{s} < 2/L$,
then we can conclude from (3.4) that

$$ (F + \Phi)(u^{n+1}) \le (F + \Phi)(u^n) - \delta D_{s_n}(u^n) \tag{3.5} $$

where $\delta = 1 - \bar{s}L/2$. Of course, the constraint on the step size is only sufficient
to guarantee such a decrease. A weaker condition is the following:

$$ \int_0^1 \langle F'(u^n + t(u^{n+1} - u^n)) - F'(u^n), u^{n+1} - u^n \rangle \, dt \le (1 - \delta) D_{s_n}(u^n) \tag{3.6} $$

for some $\delta > 0$. Regarding the proof of Lemma 1, it is easy to see that this
condition also leads to the estimate (3.5). Unfortunately, (3.6) can only be
verified a-posteriori, i.e. with the knowledge of the next iterate $u^{n+1}$. So
one has to guess an $s_n$ and check if (3.6) is satisfied, otherwise a different $s_n$
has to be chosen. In practice, this means that one iteration step is lost and
consequently more computation time is needed, reducing the advantages of
a more flexible step size.

While the descent property (3.5) can be proven without convexity
assumptions on $F$, we need such a property to estimate the distance of the
functional values to the global minimum of $F + \Phi$ in the following. We
introduce for any sequence $\{u^n\} \subset \mathcal{H}$ according to Algorithm 1 the values

$$ r_n = (F + \Phi)(u^n) - \Bigl( \min_{u \in \mathcal{H}} \, (F + \Phi)(u) \Bigr) . \tag{3.7} $$

Proposition 2. Let $F$ be convex and continuously differentiable with
Lipschitz continuous derivative. Let $\{u^n\}$ be a sequence generated by
Algorithm 1 such that the step-sizes are bounded from below, i.e. $s_n \ge \underline{s} > 0$,
and that we have

$$ (F + \Phi)(u^{n+1}) \le (F + \Phi)(u^n) - \delta D_{s_n}(u^n) $$

for a $\delta > 0$ with $D_{s_n}(u^n)$ according to (3.1).

1. If $F + \Phi$ is coercive, then the values $r_n$ according to (3.7) satisfy
$r_n \to 0$ with rate $\mathcal{O}(n^{-1})$, i.e. there exists a $C > 0$ such that

$$ r_n \le Cn^{-1} . $$

2. If for a minimizer $u^*$ and some $c > 0$ the values $r_n$ from (3.7) satisfy

$$ \|u^n - u^*\|^2 \le c r_n , \tag{3.8} $$

then $\{r_n\}$ vanishes exponentially and $\{u^n\}$ converges linearly to $u^*$,
i.e. there exists a $C > 0$ and a $\lambda \in [0, 1[$ such that

$$ \|u^n - u^*\| \le C\lambda^n . $$

Proof. We first prove an estimate for $r_n$ and then treat the cases
separately. For this purpose, pick an optimal $u^* \in \mathcal{H}$ and observe that the
decrease in each iteration step can be estimated by

$$ r_n - r_{n+1} = (F + \Phi)(u^n) - (F + \Phi)(u^{n+1}) \ge \delta D_{s_n}(u^n) , $$

according to the assumptions. Note that $D_{s_n}(u^n) \ge 0$ by (3.3), so $\{r_n\}$ is
non-increasing.

Use the convexity of $F$ to deduce

$$ \begin{aligned}
r_n &\le \Phi(u^n) - \Phi(u^*) + \langle F'(u^n), u^n - u^* \rangle \\
&= D_{s_n}(u^n) + \langle F'(u^n), u^{n+1} - u^* \rangle + \Phi(u^{n+1}) - \Phi(u^*) \\
&\le D_{s_n}(u^n) + \frac{\langle u^n - u^{n+1}, u^{n+1} - u^* \rangle}{s_n} \\
&\le D_{s_n}(u^n) + \|u^{n+1} - u^*\| \sqrt{\frac{D_{s_n}(u^n)}{s_n}}
\end{aligned} $$

by applying the Cauchy-Schwarz inequality as well as (3.2) and (3.3). With
the above estimate on $r_n - r_{n+1}$ and $0 < \underline{s} \le s_n$ we get

$$ \delta r_n \le (r_n - r_{n+1}) + \frac{\sqrt{\delta}\,\|u^{n+1} - u^*\|}{\sqrt{\underline{s}}} \sqrt{r_n - r_{n+1}} . \tag{3.9} $$

We now turn to prove the first statement of the proposition. Assume
that $F + \Phi$ is coercive, so from the fact that $\{r_n\}$ is non-increasing follows
that $\|u^n - u^*\|$ has to be bounded by a $C_1 > 0$. Furthermore, $0 \le r_n -
r_{n+1} \le r_0 < \infty$, implying

$$ \delta r_n \le \bigl( \sqrt{r_0} + \sqrt{\delta \underline{s}^{-1}}\, C_1 \bigr) \sqrt{r_n - r_{n+1}} $$

and consequently

$$ q r_n^2 \le r_n - r_{n+1} , \qquad q = \Bigl( \frac{\delta}{\sqrt{r_0} + \sqrt{\delta \underline{s}^{-1}}\, C_1} \Bigr)^2 > 0 . $$

Standard arguments then give the rate $r_n = \mathcal{O}(n^{-1})$; we repeat them here
for convenience. The above estimate on $r_n - r_{n+1}$ as well as the property that
$\{r_n\}$ is non-increasing yields

$$ \frac{1}{r_{n+1}} - \frac{1}{r_n} = \frac{r_n - r_{n+1}}{r_{n+1} r_n} \ge \frac{q r_n^2}{r_{n+1} r_n} \ge q $$

which, summed up, leads to

$$ \frac{1}{r_n} - \frac{1}{r_0} = \sum_{i=0}^{n-1} \Bigl( \frac{1}{r_{i+1}} - \frac{1}{r_i} \Bigr) \ge nq
\quad \Longrightarrow \quad \frac{1}{r_n} \ge nq + r_0^{-1} $$

and consequently, since $q > 0$, to the desired rate $r_n \le (nq + r_0^{-1})^{-1} \le Cn^{-1}$.

Regarding the second statement, assume that there is a $c > 0$ such that
$\|u^n - u^*\|^2 \le c r_n$ for some optimal $u^*$ and each $n$. Starting again at (3.9)
and applying Young's inequality yields, for each $\varepsilon > 0$,

$$ \delta r_n \le (r_n - r_{n+1}) + \frac{\delta \varepsilon \|u^{n+1} - u^*\|^2}{2\underline{s}} + \frac{r_n - r_{n+1}}{2\varepsilon} . $$

Choosing $\varepsilon = \underline{s} c^{-1}$ and exploiting the assumption $\|u^{n+1} - u^*\|^2 \le c r_{n+1}$ as
well as the fact $r_{n+1} \le r_n$ then imply

$$ \delta r_n \le (r_n - r_{n+1}) + \frac{\delta}{2} r_n + \frac{c \underline{s}^{-1}}{2} (r_n - r_{n+1})
\quad \Longrightarrow \quad r_n - r_{n+1} \ge \frac{\delta}{2 + c \underline{s}^{-1}}\, r_n $$

which in turn establishes the exponential decay rate

$$ r_{n+1} \le \Bigl( 1 - \frac{\delta}{2 + c \underline{s}^{-1}} \Bigr) r_n = \lambda^2 r_n , \qquad \text{with } \lambda = \Bigl( 1 - \frac{\delta}{2 + c \underline{s}^{-1}} \Bigr)^{1/2} \in [0, 1[ . \tag{3.10} $$

Using $\|u^n - u^*\|^2 \le c r_n$ again finishes the proof:

$$ \|u^n - u^*\| \le (c r_n)^{1/2} \le (c r_0)^{1/2} \lambda^n . $$

□
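The telescoping argument for the $\mathcal{O}(n^{-1})$ rate can be checked numerically on a model sequence. The sketch below takes the extreme case of equality, $r_{n+1} = r_n - q r_n^2$, and compares it with the bound $(nq + r_0^{-1})^{-1}$ from the proof; the values of `r0` and `q` are arbitrary illustrative choices with $q r_0 \le 1$, and the function name is hypothetical.

```python
def model_sequence(r0, q, steps):
    """Iterate r_{n+1} = r_n - q * r_n**2, i.e. equality in q r_n^2 <= r_n - r_{n+1}."""
    values = [r0]
    for _ in range(steps):
        r = values[-1]
        values.append(r - q * r * r)
    return values

r0, q = 1.0, 0.5
r = model_sequence(r0, q, 1000)

# Bound from the proof: r_n <= 1 / (n q + 1 / r_0).
bounds = [1.0 / (n * q + 1.0 / r0) for n in range(len(r))]
bound_holds = all(rn <= bn + 1e-15 for rn, bn in zip(r, bounds))
```

The model sequence stays below the bound for every index, in line with the estimate $1/r_{n+1} - 1/r_n \ge q$.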

Proposition 2 tells us that we only have to establish (3.8) to obtain
strong convergence with linear convergence rate. This can be done by
determining how fast the functionals $F$ and $\Phi$ vanish at some minimizer.
This can be made precise by introducing the following notions which also
turn out to be the essential ingredients to show (3.8): First, define for a
minimizer $u^* \in \mathcal{H}$ the functional

$$ R(v) = \langle F'(u^*), v - u^* \rangle + \Phi(v) - \Phi(u^*) . \tag{3.11} $$

Note that if the subgradient of $\Phi$ in $u^*$ is unique, $R$ is the Bregman distance
of $\Phi$ in $u^*$, a notion which is extensively used in the analysis of descent
algorithms [2, 30]. Moreover, we make use of the remainder of the Taylor
expansion of $F$,

$$ T(v) = F(v) - F(u^*) - \langle F'(u^*), v - u^* \rangle . \tag{3.12} $$

Remark 5 (On the Bregman distance). In many cases the Bregman-like
distance $R$ is enough to estimate the descent properties, see [3, 30]. For
example, in case that $\Phi$ is the $p$-th power of a norm of a 2-convex Banach
space $X$, i.e. $\Phi(u) = \|u\|_X^p$ with $p \in \,]1, 2]$, which is moreover continuously
embedded in $\mathcal{H}$, one can show that

$$ \|v - u^*\|_X^2 \le C_1 R(v) $$

holds on each bounded set of $X$, see [34]. Consequently, with $j_p = \partial\bigl(\tfrac{1}{p}\|\cdot\|_X^p\bigr)$
denoting the duality mapping with gauge $t \mapsto t^{p-1}$,

$$ \|v - u^*\|^2 \le C_2 \|v - u^*\|_X^2
\le C_1 C_2 \bigl( \|v\|_X^p - \|u^*\|_X^p - p \langle j_p(u^*), v - u^* \rangle \bigr) = cR(v) $$

observing that $R$ is in this case the Bregman distance. Often, Tikhonov
functionals for inverse problems admit such a structure, e.g.

$$ \min_{u \in \ell^2} \; \frac{\|Ku - f\|^2}{2} + \sum_{k=1}^{\infty} |u_k|^p , $$

a regularization which is also topic in [7]. As one can see in complete analogy
to Proposition 1, the generalized gradient projection method also amounts to
the iteration proposed there, so as a by-product and after verifying that the
prerequisites of Proposition 2 indeed hold, one immediately gets a linearly
convergent method.

FIGURE 1: Illustration of the Bregman-like distance $R$ and the Taylor
distance $T$ for a convex $\Phi$ and a smooth $F$. Note that for the optimal value
$u^*$ it holds $-F'(u^*) \in \partial\Phi(u^*)$.

However, in the case that $\Phi$ is not sufficiently convex, the Bregman
distance alone is not sufficient to obtain the required estimate on the $r_n$.
This is the case for $F$ and $\Phi$ according to (2.4). In this situation we also have
to take the "Taylor distance" $T$ into account. Figure 1 shows an illustration
of the values $R$ and $T$. One could say that the Bregman distance measures
the error corresponding to the $\Phi$ part while the Taylor distance does the
same for the $F$ part.

The functionals R and T possess the following properties: 

Lemma 2. Consider the problem (2.2) where $F$ is convex, differentiable
and $\Phi$ is proper, convex and lower semi-continuous. If $u^* \in \mathcal{H}$ is a solution
of (2.2) and $v \in \mathcal{H}$ is arbitrary, then the functionals $R$ and $T$ according to
(3.11) and (3.12), respectively, are non-negative and satisfy

$$ R(v) + T(v) = (F + \Phi)(v) - (F + \Phi)(u^*) . $$

Proof. The identity is obvious from the definition of $R$ and $T$. For the
non-negativity of $R$, note that since $u^*$ is a solution, it holds that $-F'(u^*) \in
\partial\Phi(u^*)$. Hence, the subgradient inequality reads as

$$ \Phi(u^*) - \langle F'(u^*), v - u^* \rangle \le \Phi(v) \quad \Longrightarrow \quad R(v) \ge 0 . $$

Likewise, the property $T(v) \ge 0$ is a consequence of the convexity of $F$. □

Now it follows immediately that $R(v) = T(v) = 0$ whenever $v$ is a
minimizer. To conclude this section, the main statement about the convergence
of the generalized gradient projection method reads as:

Theorem 2. Let $F$ be a convex, differentiable functional with
Lipschitz-continuous derivative (with associated Lipschitz constant $L$), $\Phi$ be proper,
convex and lower semi-continuous and $\{u^n\}$ be a sequence generated by
Algorithm 1 with step-size according to (2.3). Moreover, suppose that $u^* \in \mathcal{H}$
is a solution of the minimization problem (2.2).

If, for each $M \in \mathbb{R}$ there exists a constant $c(M) > 0$ such that

$$ \|v - u^*\|^2 \le c(M)\bigl(R(v) + T(v)\bigr) \tag{3.13} $$

for each $v$ satisfying $(F + \Phi)(v) \le M$ and $R(v)$ and $T(v)$ defined by (3.11)
and (3.12), respectively, then $\{u^n\}$ converges linearly to the unique
minimizer $u^*$.

Proof. A step-size chosen according to (2.3) implies, by Lemma 1, the
descent property (3.5) with $\delta = 1 - \bar{s}L/2$. In particular, from (3.5) follows
that $\{r_n\}$ is non-increasing (also remember (3.3) means in particular that
$D_{s_n}(u^n) \ge 0$). Now choose $M = (F + \Phi)(u^0) < \infty$ for which, by assumption,
a $c > 0$ exists such that

$$ \|u^n - u^*\|^2 \le c\bigl(R(u^n) + T(u^n)\bigr) = c r_n . $$

Hence, the prerequisites for Proposition 2 are fulfilled and consequently,
$u^n \to u^*$ with a linear rate. Finally, the minimizer has to be unique: If $u^{**}$
is also a minimizer, then $u^{**}$ plugged into (3.13) gives $\|u^{**} - u^*\|^2 = 0$ and
consequently $u^* = u^{**}$. □



4. Convergence rates for the iterative 
soft-thresholding method 

We now turn to the proof of the main result, Theorem 1, which collects the
results of Sections 2 and 3. Within this section, we consider the regularized
inverse problem (1.1) under the prerequisites of Proposition 1. It is known
that at least one minimizer for (1.1) exists [7].

We have already seen in Proposition 1 that the iterative thresholding
procedure (1.2) is equivalent to a generalized gradient projection method.
Our aim is, on the one hand, to apply Proposition 2 in order to get strong
convergence from the descent rate $\mathcal{O}(n^{-1})$. On the other hand, we will show
the applicability of Theorem 2 for $K$ possessing the FBI property which
implies the desired convergence speed. Observe that $F$ and $\Phi$ meet the
requirements of Theorem 2 and that the step-size rule (1.3) immediately
implies (2.3), so we only have to verify (3.13). This will be done, along with a
Bregman-distance estimate, in the following lemma, which also serves as
the crucial prerequisite for showing convergence.

Lemma 3. For each minimizer $u^*$ of (1.1) and each $M \in \mathbb{R}$, there exists
a $c_1(M, u^*)$ and a subspace $U \subset \ell^2$ with finite-dimensional complement such
that for the Bregman-like distance (3.11) it holds that

$$ R(v) \ge c_1(M, u^*) \|P_U(v - u^*)\|^2 \tag{4.1} $$

whenever $(F + \Phi)(v) \le M$ with $F$ and $\Phi$ defined by (2.4).

If $K$ moreover satisfies the FBI property, there is a $c_2(M, u^*, K) > 0$
such that, whenever $(F + \Phi)(v) \le M$, the associated Bregman-Taylor
distance according to (3.11) and (3.12) satisfies

$$ R(v) + T(v) \ge c_2(M, u^*, K) \|v - u^*\|^2 . \tag{4.2} $$

Proof. Let $u^*$ be a minimizer of (1.1) and assume that $v \in \ell^2$ satisfies
$\Phi(v) \le M$ for a $M \ge 0$. Then,

$$ R(v) = \sum_{k=1}^{\infty} \alpha_k |v_k| - \sum_{k=1}^{\infty} \alpha_k |u^*_k| - \sum_{k=1}^{\infty} w^*_k (v_k - u^*_k) \tag{4.3} $$

where $w^* = -F'(u^*) = -K^*(Ku^* - f)$. Now since $w^* \in \partial\Phi(u^*)$ we have
$w^*_k \in \alpha_k \operatorname{sgn}(u^*_k)$ for each $k$ (note that $\partial|\cdot| = \operatorname{sgn}(\cdot)$ with $\operatorname{sgn}(0) = [-1, 1]$),
meaning that

$$ \alpha_k \bigl(|v_k| - |u^*_k|\bigr) - w^*_k (v_k - u^*_k) \ge 0 $$

for each $k$. Denote by $I = \{k \ge 1 : |w^*_k| = \alpha_k\}$ which has to be finite since
$w^* \in \ell^2$ implies

$$ \infty > \sum_{k \in I} |w^*_k|^2 = \sum_{k \in I} \alpha_k^2 \ge \sum_{k \in I} \alpha^2 = |I| \alpha^2 . $$

Moreover, $w^*_k \to 0$ as $k \to \infty$, so there has to be a $\rho < 1$ such that $|w^*_k|/\alpha_k \le
\rho$ for each $k \in \mathbb{N} \setminus I$. Also, if $k \in \mathbb{N} \setminus I$, then $|w^*_k| \le \rho\alpha_k$ which means in
particular that $u^*_k = 0$ since the opposite contradicts $w^*_k \in \alpha_k \operatorname{sgn}(u^*_k)$. So,
one can estimate (4.3):

$$ \begin{aligned}
R(v) &\ge \sum_{k \notin I} \alpha_k |v_k| - w^*_k v_k \ge \sum_{k \notin I} \alpha_k (1 - \rho) |v_k| \\
&\ge (1 - \rho)\alpha \sum_{k \notin I} |v_k - u^*_k| \ge (1 - \rho)\alpha \Bigl( \sum_{k \notin I} |v_k - u^*_k|^2 \Bigr)^{1/2}
\end{aligned} $$

using the fact that one can estimate the $\ell^2$-sequence norm with the
$\ell^1$-sequence norm, see [3] for example. With $U = \{v \in \ell^2 : v_k = 0 \text{ for } k \in I\}$,
the above also reads as $R(v) \ge (1 - \rho)\alpha \|P_U(v - u^*)\|$ with $P_U$ being the
orthogonal projection onto $U$ in $\ell^2$.

Next, observe that $P_U u^* = 0$ and $\alpha \|P_U v\| \le \alpha \|v\| \le \Phi(v) \le M + 1$, hence we have
$\|P_U v\|^{-1} \ge \alpha/(M + 1)$. Consequently,

$$ R(v) \ge \frac{(1 - \rho)\alpha^2}{M + 1} \|P_U(v - u^*)\|^2 = c_1(M, u^*) \|P_U(v - u^*)\|^2 $$

which corresponds to the estimate (4.1). Finally, $\Phi(v) \le M$ whenever $(F +
\Phi)(v) \le M$ and there is no $v$ such that $(F + \Phi)(v) < 0$. Hence, for each
$M \in \mathbb{R}$ there is a constant for which (4.1) holds whenever $(F + \Phi)(v) \le M$
which is the desired statement for $R$.

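To prove (4.2), suppose $K$ possesses the FBI property. Recall that $T(v)$
can be expressed by

$$ T(v) = F(v) - F(u^*) - \langle F'(u^*), v - u^* \rangle = \frac{\|K(v - u^*)\|^2}{2} . \tag{4.4} $$

The claim now is that there is a $C(M, u^*, K)$ such that

$$ \|u\|^2 \le C(M, u^*, K) \bigl( c_1(M, u^*) \|P_U u\|^2 + \tfrac{1}{2} \|Ku\|^2 \bigr) $$

for each $u \in \ell^2$. We will derive this constant directly. First split $u =
P_U u + P_{U^\perp} u$, so we can estimate, with the help of the inequalities of
Cauchy-Schwarz and Young ($ab \le a^2/4 + b^2$ for $a, b \ge 0$),

$$ \begin{aligned}
\frac{\|Ku\|^2}{2} &= \frac{\|KP_{U^\perp} u\|^2}{2} + \langle KP_{U^\perp} u, KP_U u \rangle + \frac{\|KP_U u\|^2}{2} \\
&\ge \frac{\|KP_{U^\perp} u\|^2}{4} - \frac{\|KP_U u\|^2}{2} \ge \frac{\|KP_{U^\perp} u\|^2}{4} - \frac{\|K\|^2}{2} \|P_U u\|^2 .
\end{aligned} $$

Since $K$ fulfills the FBI property, the operator restricted to $U^\perp$ is injective
on the finite-dimensional space $U^\perp$, so there exists a $c(U, K) > 0$ such that
$c(U, K) \|P_{U^\perp} u\|^2 \le \|KP_{U^\perp} u\|^2$ for all $u \in \ell^2$. Hence,

$$ \|P_{U^\perp} u\|^2 \le 4 c(U, K)^{-1} \bigl( \tfrac{1}{2} \|K\|^2 \|P_U u\|^2 + \tfrac{1}{2} \|Ku\|^2 \bigr) $$

and consequently

$$ \begin{aligned}
\|u\|^2 &= \|P_{U^\perp} u\|^2 + \|P_U u\|^2 \\
&\le 4 c(U, K)^{-1} \Bigl( \bigl( \tfrac{1}{2} \|K\|^2 + \tfrac{1}{4} c(U, K) \bigr) \|P_U u\|^2 + \tfrac{1}{2} \|Ku\|^2 \Bigr) \\
&\le \frac{2\|K\|^2 + c(U, K) + 4 c_1(M, u^*)}{c(U, K)\, c_1(M, u^*)} \bigl( c_1(M, u^*) \|P_U u\|^2 + \tfrac{1}{2} \|Ku\|^2 \bigr)
\end{aligned} $$

giving a constant $C(M, u^*, K) > 0$ since $U$ depends on $u^*$.

This finally yields the statement

$$ \begin{aligned}
\|v - u^*\|^2 &\le C(M, u^*, K) \bigl( c_1(M, u^*) \|P_U(v - u^*)\|^2 + \tfrac{1}{2} \|K(v - u^*)\|^2 \bigr) \\
&\le C(M, u^*, K) \bigl( R(v) + T(v) \bigr) ,
\end{aligned} $$

consequently, (4.2) holds for $c_2(M, u^*, K) = C(M, u^*, K)^{-1}$. □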

In the following, we will see that the estimate (4.1) applied to $R(u^n)$
already leads to strong convergence of the iterative soft-thresholding
procedure. Nevertheless, we utilize (4.2) later, when the linear convergence result
will be proven.

Lemma 4. Let $K : \ell^2 \to \mathcal{H}_2$ be a linear and continuous operator as well as
$f \in \mathcal{H}_2$. Consider the sequence $\{u^n\}$ which is generated by the iterative
soft-thresholding procedure (1.2) with step-sizes $\{s_n\}$ according to (1.3). Then,
$\{u^n\}$ converges to a minimizer in the strong sense.

Proof. Since the Lipschitz constant for $F'$ satisfies $L \le \|K\|^2$, the step
sizes fulfill (2.3) which implies, by Lemma 1, the descent property
(3.5) with $\delta = 1 - \bar{s}\|K\|^2/2$. This means in particular that the associated
functional distances $\{r_n\}$ are non-increasing (since (3.3) in particular gives
that $D_{s_n}(u^n) \ge 0$). Moreover, the descent result in Proposition 2 yields
that the iterates $\{u^n\}$ satisfy $R(u^n) \le r_n = \mathcal{O}(n^{-1})$. Since $(F + \Phi)(u^n) \le
(F + \Phi)(u^0) = M$, we can apply Lemma 3 and the estimate (4.1) leads to
strong convergence of $\{P_U u^n\}$, i.e. $P_U u^n \to P_U u^*$.

Next, consider the complement parts $\{P_{U^\perp} u^n\}$ in the finite-dimensional
space $U^\perp$. Since $\|P_{U^\perp} u^n\| \le \|u^n\| \le \alpha^{-1}\Phi(u^n) \le \alpha^{-1} M$, the sequence $\{P_{U^\perp} u^n\}$
is contained in a relatively compact set in $U^\perp$, hence there is a (strong)
accumulation point $u^{**} \in U^\perp$. Together with $P_U u^n \to P_U u^*$ we can conclude
that there is a subsequence satisfying $u^{n_l} \to P_U u^* + u^{**} = u^{***}$. Moreover,
$\{u^n\}$ is a minimizing sequence, so $u^{***}$ has to be a minimizer.

Finally, the whole sequence has to converge to $u^{***}$: The mappings
$T_n(u) = J_{s_n}\bigl(u - s_n F'(u)\bigr)$ satisfy

$$ \|T_n(u) - T_n(v)\| \le \|(I - s_n K^*K)(u - v)\| \le \|u - v\| $$

for all $u, v \in \ell^2$, since all proximal mappings $J_{s_n}$ are non-expansive and $s_n <
2/\|K\|^2$. So if, for an arbitrary $\varepsilon > 0$ there exists an $n$ such that $\|u^n - u^{***}\| \le \varepsilon$,
then

$$ \|u^{n+1} - u^{***}\| = \|T_n(u^n) - T_n(u^{***})\| \le \|u^n - u^{***}\| \le \varepsilon $$

since $u^{***}$ is a minimizer and hence a fixed point of each $T_n$ (see Section 2).
By induction, $u^n \to u^{***}$ strongly in $\ell^2$. □

With the notions of FBI property and strict sparsity pattern from
Definitions 1 and 2, respectively, one is able to show linear convergence as soon as one of
these two situations is given.

Proof of Theorem 1. Observe that the prerequisites of Lemma 4 are
fulfilled, so there exists a minimizer $u^*$ such that $u^n \to u^*$ in $\ell^2$. Thus, we
have to show that each of the two cases stated in the theorem leads to a
linear convergence rate.

Consider the first case, i.e. $K$ possesses the FBI property. We utilize
that, by Lemma 3, the Bregman-Taylor distance according to (3.11) and
(3.12) can be estimated such that (3.13) is satisfied for some $c > 0$. This
implies, by Theorem 2, the linear convergence rate.

For the proof for the second case, we refer to the appendix. □

Corollary 1. In particular, choosing the step-size constant, i.e. $s_n = s$
with $s \in \,]0, 2\|K\|^{-2}[$ also leads to linear convergence under the prerequisites
of Theorem 1; for example the step-size $s_n = 1$ works for $\|K\| < \sqrt{2}$.
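For a matrix $K$, the admissible range of constant step sizes can be estimated numerically. The sketch below uses plain power iteration on $K^{T}K$ to approximate $\|K\|^2$; this is a generic technique and not part of the paper. Note that power iteration approaches $\|K\|^2$ from below, so in practice a safety margin on the step size is advisable. The function name `estimate_norm_squared` and the test matrix are illustrative assumptions.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def estimate_norm_squared(K, iterations=200):
    """Power iteration on K^T K; the returned value approximates ||K||^2."""
    Kt = transpose(K)
    n = len(K[0])
    # Unit start vector; assumed not orthogonal to the leading eigenvector.
    x = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iterations):
        y = matvec(Kt, matvec(K, x))                 # y = K^T K x
        estimate = math.sqrt(sum(v * v for v in y))  # ||K^T K x|| for unit x
        if estimate == 0.0:
            return 0.0
        x = [v / estimate for v in y]
    return estimate

K = [[1.0, 0.5], [0.0, 1.0]]
norm_sq = estimate_norm_squared(K)
upper = 2.0 / norm_sq             # constant steps must lie in ]0, upper[
s = 1.0
step_is_admissible = 0.0 < s < upper
```

For this matrix $\|K\| < \sqrt{2}$, so the constant step size $s_n = 1$ from Corollary 1 is admissible.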

Remark 6 (Descent and Bregman-Taylor implies linear rate). With
Theorem 2, the linear convergence follows directly from the estimate of the
Bregman-Taylor distance

\[ \|v - u^*\|^2 \le c(M, u^*, K)\bigl(R(v) + T(v)\bigr) \quad\text{whenever}\quad (F + \Phi)(v) \le M , \]

which can be established if $K$ satisfies the FBI property. Since the proof
of Theorem 2 relies essentially on Proposition 2, one can easily convince
oneself that the applicability of this proposition is sufficient for linear
convergence, which is already the case if (3.5) and
$0 < \underline{s} \le s_n$ are satisfied.



Remark 7 (The weak step-size condition as accelerated method).
As already mentioned in an earlier remark, the condition on the step-size
can be relaxed. In the particular setting that
$F(u) = \tfrac12\|Ku - f\|^2$, the estimate (3.6) reads as

\[
\int_0^1 \bigl\langle K^*K\bigl(u^n + t(u^{n+1} - u^n)\bigr) - K^*Ku^n,\; u^{n+1} - u^n \bigr\rangle \, dt
= \frac{\|K(u^{n+1} - u^n)\|^2}{2} \le (1 - \delta) D_{s_n}(u^n) .
\]

Now, the choice of $s_n$ according to

\[ s_n \|K(u^{n+1} - u^n)\|^2 \le 2(1 - \delta)\|u^{n+1} - u^n\|^2 \tag{4.5} \]

is sufficient for the above, since one has the estimate (3.3). Together
with the boundedness $0 < \underline{s} \le s_n$, this is exactly the
step-size 'Condition (B)' in [8].

Hence, as can be easily seen, this choice gives sufficient descent in order
to apply Proposition 2. Consequently, linear convergence remains valid for
such an 'accelerated' iterative soft-thresholding procedure if $K$
possesses the FBI property, see Remark 6.
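One simple way to obtain step-sizes satisfying (4.5) is to start from an
optimistic trial value and shrink it until the inequality holds. The sketch
below does this by halving; this backtracking rule, the parameter values and
all names are assumptions made for illustration and are not taken from the
article or from [8]. Since every $s \le 2(1-\delta)/\|K\|^2$ satisfies (4.5),
the loop terminates and the accepted step-sizes stay bounded away from zero.

```python
import math

def soft_threshold(w, t):
    return [math.copysign(max(abs(x) - tk, 0.0), x) for x, tk in zip(w, t)]

def apply(K, x):
    return [sum(a * b for a, b in zip(row, x)) for row in K]

def apply_adjoint(K, y):
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]

def sq_norm(x):
    return sum(a * a for a in x)

def objective(K, f, alpha, u):
    r = [a - b for a, b in zip(apply(K, u), f)]
    return 0.5 * sq_norm(r) + sum(a * abs(x) for a, x in zip(alpha, u))

def accelerated_step(K, f, alpha, u, s_try, delta=0.1, shrink=0.5):
    """Return (u_next, s) with s fulfilling the step-size condition (4.5)."""
    residual = [a - b for a, b in zip(apply(K, u), f)]
    grad = apply_adjoint(K, residual)
    s = s_try
    while True:
        shifted = [ui - s * g for ui, g in zip(u, grad)]
        u_next = soft_threshold(shifted, [s * a for a in alpha])
        d = [a - b for a, b in zip(u_next, u)]
        # Condition (4.5): s ||K d||^2 <= 2 (1 - delta) ||d||^2.
        if s * sq_norm(apply(K, d)) <= 2.0 * (1.0 - delta) * sq_norm(d):
            return u_next, s
        s *= shrink  # assumed backtracking rule

K = [[3.0, 0.6, 0.0],
     [0.0, 2.4, 0.3],
     [0.3, 0.0, 1.5]]
f = [1.0, -0.5, 0.05]
alpha = [0.1, 0.1, 0.1]
u = [0.0, 0.0, 0.0]
values, steps = [objective(K, f, alpha, u)], []
for _ in range(100):
    u, s = accelerated_step(K, f, alpha, u, s_try=1.0)
    steps.append(s)
    values.append(objective(K, f, alpha, u))
```

In this sketch the functional values are non-increasing, as the descent
property behind (4.5) suggests, while the accepted step-sizes may exceed
what a fixed bound based on $\|K\|$ alone would allow.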
Remark 8 (Relaxation of the FBI property). It is possible to relax
the FBI property. Suppose that $K$ fulfills the FBI property of order
$S = |I|$ (with the set $I$ defined in the proof of Lemma 3), i.e., that
$K|_J$ is injective for every finite subset $J \subset \mathbb{N}$ of size
less than or equal to $S$. This immediately yields the existence of
$c(U, K) > 0$ such that $c(U, K)\|u\|^2 \le \|Ku\|^2$ for each
$u \in U^\perp$, where $U^\perp$ is the finite-coefficient subspace as
defined in the proof of Lemma 3. One can easily check that the remaining
arguments also remain true and consequently, Theorem 1 still holds.

The constants in the estimates of Lemma 3 and Theorem 1 are in general not
computable unless the solution is determined. Nonetheless, there are some
situations in which prior knowledge about the operator $K$ can be used to
estimate the decay rate.

Theorem 3. Let $K : \ell^2 \to \mathcal{H}_2$, $K \ne 0$, be a compact,
linear operator fulfilling the FBI property and define

\[
\sigma_k = \inf\{\|Ku\|^2 : \|u\| = 1,\ u_l = 0 \text{ for all } l \ge k\}, \qquad
\mu_k = \sup\{\|Ku\|^2 : \|u\| = 1,\ u_l = 0 \text{ for all } l < k\} .
\]

Furthermore, choose $k_0$ such that $\mu_{k_0} \le \alpha^2/(4\|f\|^2)$
(with $\infty$ allowed on the right-hand side). Let $\{u^n\}$ be a sequence
generated by the iterative soft-thresholding algorithm with initial value
$u^0 = 0$ and constant step-size $s = \|K\|^{-2}$ for the minimization of
(1.1) and let $u^*$ denote a minimizer. Then it holds that
$\|u^n - u^*\| \le C\lambda^n$ with

\[
\lambda = \max\Bigl( 1 - \frac{\sigma_{k_0}}{4\sigma_{k_0} + 8\|K\|^2},\;
1 - \frac{\sigma_{k_0}\alpha^2}{4\sigma_{k_0}\alpha^2 + 2(\sigma_{k_0} + 2\|K\|^2)\|K\|^2\|f\|^2} \Bigr)^{1/2}
\]

for some $C > 0$.

The proof is given in Appendix B.



5. Convergence of related methods

In this section, we show how linear convergence can be obtained for some
related methods. In particular, iterative thresholding methods for
minimization problems with joint sparsity constraints as well as an
accelerated gradient projection method are considered. Both algorithms can
be written as a generalized gradient projection method, hence the analysis
carried out in Sections 2 and 3 can be applied, demonstrating the broad
range of applications.
5.1 Joint sparsity constraints

First, we consider the situation of so-called joint sparsity for
vector-valued problems, see [1, 17, 32]. The problems considered are set in
the Hilbert space $(\ell^2)^N$ for some $N \ge 1$, which is interpreted such
that for $u \in (\ell^2)^N$ the $k$-th component $u_k$ is a vector in
$\mathbb{R}^N$. Given a linear and continuous operator
$K : (\ell^2)^N \to \mathcal{H}_2$, some data $f \in \mathcal{H}_2$, a norm
$|\cdot|$ on $\mathbb{R}^N$ and a sequence $\alpha_k \ge \alpha > 0$, the
typical inverse problem with joint sparsity constraints reads as

\[ \min_{u \in (\ell^2)^N} \frac{\|Ku - f\|^2}{2} + \sum_{k=1}^\infty \alpha_k |u_k| . \tag{5.1} \]

In many applications, $|\cdot| = |\cdot|_q$ for some $1 \le q \le \infty$.

To apply the generalized gradient projection method for (5.1), we split
the functional into

\[ F(u) = \frac{\|Ku - f\|^2}{2} , \qquad \Phi(u) = \sum_{k=1}^\infty \alpha_k |u_k| . \]

Analogously to Proposition 1, one needs to know the associated proximal
mappings $J_s$, which can be reduced to the computation of the proximal
mappings for $\partial|\cdot|$ on $\mathbb{R}^N$. These are known to be

\[ (I + s\,\partial|\cdot|)^{-1}(x) = (I - P_{\{|\cdot|_* \le s\}})(x) \]

where $P_{\{|\cdot|_* \le s\}}$ denotes the projection onto the closed
$s$-ball associated with the dual norm $|\cdot|_*$. Again, as can be seen
in analogy to Proposition 1, the generalized gradient projection method for
(5.1) is given by the iteration

\[
u^{n+1} = S_{s_n\alpha}\bigl(u^n - s_n K^*(Ku^n - f)\bigr) , \qquad
\bigl(S_{s_n\alpha}(w)\bigr)_k = (I - P_{\{|\cdot|_* \le s_n\alpha_k\}})(w_k) \tag{5.2}
\]

where $\{s_n\}$ satisfies a suitable step-size rule, e.g. according to
(1.3) or (4.5).
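For the Euclidean norm $|\cdot| = |\cdot|_2$, which is its own dual norm,
the thresholding in (5.2) has a simple closed form: each block $w_k$ is
scaled by $\max(0, 1 - s_n\alpha_k/|w_k|_2)$. The following minimal sketch
implements this special case only; the function names and the sample data
are hypothetical.

```python
import math

def block_soft_threshold(x, t):
    """(I - P)(x), where P projects onto the Euclidean ball of radius t."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= t:
        # x lies inside the ball, so x - P(x) = 0.
        return [0.0] * len(x)
    scale = 1.0 - t / norm
    return [scale * c for c in x]

def joint_threshold(w, s, alpha):
    """Apply the blockwise thresholding with thresholds s * alpha_k."""
    return [block_soft_threshold(w_k, s * a_k) for w_k, a_k in zip(w, alpha)]

# Three blocks in R^2 with weights alpha_k and step-size s = 1.
w = [[3.0, 4.0], [0.3, 0.4], [0.0, -2.0]]
alpha = [1.0, 1.0, 0.5]
result = joint_threshold(w, 1.0, alpha)
```

Blocks whose norm does not exceed the threshold are set to zero as a whole,
which is the mechanism behind joint sparsity; other norms $|\cdot|_q$ would
require the projection onto the corresponding dual-norm ball instead.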

Let us examine this method with respect to convergence. First, fix a
minimizer $u^*$ which satisfies the optimality condition
$w^* = -K^*(Ku^* - f) \in \partial\Phi(u^*)$. As one knows from convex
analysis, this can also be formulated pointwise, and Asplund's
characterization of $\partial|\cdot|$ (see [31], Proposition II.8.6) leads
to

\[
\begin{aligned}
|w_k^*|_* &\le \alpha_k && \text{if } u_k^* = 0 , \\
|w_k^*|_* &= \alpha_k \ \text{ and } \ w_k^* \cdot u_k^* = \alpha_k |u_k^*| && \text{if } u_k^* \ne 0 ,
\end{aligned}
\]

where $w_k^* \cdot u_k^*$ denotes the usual inner product of $w_k^*$ and
$u_k^*$ in $\mathbb{R}^N$. Now, one can proceed in complete analogy to the
proof of Lemma 3 in order to get an estimate of the associated Bregman
distance: One constructs $I = \{k \in \mathbb{N} : |w_k^*|_* = \alpha_k\}$
as well as the closed subspace
$U = \{v \in (\ell^2)^N : v_k = 0 \text{ if } k \in I\}$, for which
$U^\perp$ is finite-dimensional. Furthermore, we have
$\rho = \sup_{k \notin I} |w_k^*|_*/\alpha_k < 1$ and, by equivalence of
norms in $\mathbb{R}^N$, one gets $c_0, C_0 > 0$ such that
$c_0|x|_2 \le |x| \le C_0|x|_2$ (with $|x|_2^2 = x \cdot x$) for all
$x \in \mathbb{R}^N$. Then, for a given $M \in \mathbb{R}$, whenever
$(F + \Phi)(v) \le M$,

\[ R(v) \ge c_1(M, u^*)\,\|P_U(v - u^*)\|^2 , \]

establishing an analogue of (4.1). If, moreover, $K$ satisfies the FBI
property, then one also gets an analogue of (4.2), i.e.

\[ R(v) + T(v) \ge c_2(M, u^*, K)\,\|v - u^*\|^2 \]

whenever $(F + \Phi)(v) \le M$, by arguing analogously to Lemma 3.

Since these two inequalities are the essential ingredients for proving
convergence as well as the linear rate, cf. Lemma 4 and Theorem 1, the
following holds:

Theorem 4. The iterative soft-thresholding procedure (5.2) for the
minimization problem (5.1) converges to a minimizer in the strong sense in
$(\ell^2)^N$ if the step-size rule (1.3) is satisfied.

Furthermore, the convergence will be at linear rate if $K$ possesses the
FBI property and the step-size rule (4.5) as well as
$0 < \underline{s} \le s_n$ is satisfied. In particular, this is the case
when $0 < \underline{s} \le s_n \le \bar{s} < 2/\|K\|^2$.



5.2 Accelerated gradient projection methods 

An alternative approach to implement sparsity constraints for linear
inverse problems is based on minimizing the discrepancy within a weighted
$\ell^1$-ball [8]. With the notation used in Section 1, the problem can be
generally formulated as

\[
\min_{u \in \Omega} \frac{\|Ku - f\|^2}{2} , \qquad
\Omega = \Bigl\{ u \in \ell^2 : \sum_{k=1}^\infty \alpha_k |u_k| \le 1 \Bigr\} . \tag{5.3}
\]

For this classical situation of constrained minimization, one finds that
the generalized gradient projection method and the gradient projection
method coincide (for $F(u) = \tfrac12\|Ku - f\|^2$ and $\Phi = I_\Omega$),
as discussed earlier, and yield the iteration proposed in [8].
Consequently, classical convergence results hold for a variety of step-size
rules [11], including the 'Condition (B)' introduced in [8], see also
Remark 7.

Let us note that linear convergence results can be obtained with the same
techniques which have been used to prove Theorem 1. First, consider the
Bregman distance $R$ associated with $\Phi = I_\Omega$ in a minimizer
$u^* \in \Omega$. With $w^* = -K^*(Ku^* - f)$, the optimality condition
reads as

\[ \langle w^*, v - u^* \rangle \le 0 \ \text{ for all } v \in \Omega \quad\Longleftrightarrow\quad \|\alpha^{-1}w^*\|_\infty = \langle w^*, u^* \rangle \]






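where $(\alpha^{-1}w^*)_k = \alpha_k^{-1}w_k^*$, which is in $\ell^\infty$
since $\alpha_k \ge \alpha > 0$. Introduce
$I = \{k : \alpha_k^{-1}|w_k^*| = \|\alpha^{-1}w^*\|_\infty\}$, which has to
be finite since otherwise $w^* \notin \ell^2$, see the proof of Lemma 3.
Suppose that $w^* \ne 0$ (which corresponds to $Ku^* \ne f$), so
$\sup_{k \notin I} |\alpha_k^{-1}w_k^*| / \|\alpha^{-1}w^*\|_\infty = \rho < 1$.
Moreover,

\[ \sum_{k \in I} \alpha_k |u_k^*| = 1 \quad\text{and}\quad \operatorname{sgn}(u_k^*) = \operatorname{sgn}(w_k^*) \ \text{ for all } k \text{ with } u_k^* \ne 0 , \tag{5.4} \]

since $\sum_{k \in I} \alpha_k |u_k^*| < 1$ leads to the contradiction

\[
\begin{aligned}
\|\alpha^{-1}w^*\|_\infty = \langle w^*, u^* \rangle
&= \sum_{k=1}^\infty \alpha_k^{-1} w_k^* \, \alpha_k u_k^* \\
&\le \sum_{k \notin I} |\alpha_k^{-1} w_k^*| \, |\alpha_k u_k^*| + \|\alpha^{-1}w^*\|_\infty \sum_{k \in I} \alpha_k |u_k^*| \\
&\le \Bigl( \rho \sum_{k \notin I} \alpha_k |u_k^*| + \sum_{k \in I} \alpha_k |u_k^*| \Bigr) \|\alpha^{-1}w^*\|_\infty < \|\alpha^{-1}w^*\|_\infty ,
\end{aligned}
\]

while $\operatorname{sgn}(u_k^*) \ne \operatorname{sgn}(w_k^*)$ for some
$k$ with $u_k^* \ne 0$ implies the contradiction

\[
\|\alpha^{-1}w^*\|_\infty = \sum_{k=1}^\infty \alpha_k^{-1} w_k^* \, \alpha_k u_k^*
< \sum_{k=1}^\infty |\alpha_k^{-1} w_k^*| \, |\alpha_k u_k^*| \le \|\alpha^{-1}w^*\|_\infty .
\]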

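Moreover, $\sum_{k \in I} \alpha_k |u_k^*| = 1$ also yields $u_k^* = 0$ for
all $k \notin I$. Furthermore, observe that the equation for the signs in
(5.4) gives $\sum_{k \in I} w_k^* u_k^* = \|\alpha^{-1}w^*\|_\infty$. For
$v \notin \Omega$ we have $R(v) = \infty$, so estimate the Bregman distance
for $v \in \Omega$ as follows:

\[
\begin{aligned}
R(v) = -\langle w^*, v - u^* \rangle
&= \sum_{k \in I} \alpha_k^{-1} w_k^* \, \alpha_k (u_k^* - v_k) - \sum_{k \notin I} \alpha_k^{-1} w_k^* \, \alpha_k v_k \\
&\ge \|\alpha^{-1}w^*\|_\infty - \|\alpha^{-1}w^*\|_\infty \sum_{k \in I} \alpha_k |v_k| - \rho \|\alpha^{-1}w^*\|_\infty \sum_{k \notin I} \alpha_k |v_k| \\
&\ge (1 - \rho) \sum_{k \notin I} \alpha_k |v_k| \ge (1 - \rho)\alpha \|P_U v\|_1
\end{aligned}
\]

where $U = \{u \in \ell^2 : u_k = 0 \text{ for } k \in I\}$. Using that
$\|v\| \le \|v\|_1$ as well as $\alpha\|v\| \le \alpha\|v\|_1 \le 1$ for
all $v \in \Omega$ finally gives, together with $P_U u^* = 0$,

\[ R(v) \ge (1 - \rho)\alpha^2 \|P_U(v - u^*)\|^2 \quad\text{for all } v \in \ell^2 . \]

If $K$ possesses the FBI property, one can, analogously to the
argumentation presented in the proof of Lemma 3, estimate the
Bregman-Taylor distance such that, for some $c(u^*, K) > 0$,

\[ R(v) + T(v) \ge c(u^*, K)\,\|v - u^*\|^2 \quad\text{for all } v \in \ell^2 . \]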
By Theorem 2, the gradient projection method for (5.3) converges linearly.
This remains true for each 'accelerated' step-size choice according to
'Condition (B)' in [8], see Remark 7. This result can be summarized in the
following theorem.

Theorem 5. Assume that $K : \ell^2 \to \mathcal{H}_2$ satisfies the FBI
property, $\alpha_k \ge \alpha > 0$ and
$f \in \mathcal{H}_2 \setminus K(\Omega)$, where
$K(\Omega) = \{Ku : \|\alpha u\|_1 \le 1\}$.

Then, the gradient projection method for the minimization problem (5.3)
converges linearly whenever the step-size rule (4.5) as well as
$0 < \underline{s} \le s_n$ is fulfilled. This is in particular the case
for $0 < \underline{s} \le s_n \le \bar{s} < 2/\|K\|^2$.
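To make the gradient projection iteration concrete, the following minimal
sketch treats the special case of unit weights $\alpha_k = 1$ in finite
dimensions, where the projection onto the $\ell^1$-ball can be computed with
a standard sort-based routine. The restriction to unit weights, the routine
itself, the test problem and all names are assumptions made for
illustration; they are not taken from the article or from [8].

```python
import math

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {u : sum_k |u_k| <= radius}."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    mags = sorted((abs(x) for x in v), reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, m in enumerate(mags, start=1):
        cumulative += m
        candidate = (cumulative - radius) / j
        if m - candidate > 0:
            theta = candidate  # keep the value for the largest admissible j
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def apply(K, x):
    return [sum(a * b for a, b in zip(row, x)) for row in K]

def apply_adjoint(K, y):
    return [sum(K[i][j] * y[i] for i in range(len(K))) for j in range(len(K[0]))]

def gradient_projection(K, f, s, n_iter):
    """Iterate u <- P_Omega(u - s K^*(Ku - f)) starting from u = 0."""
    u = [0.0] * len(K[0])
    discrepancies = []
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(apply(K, u), f)]
        discrepancies.append(0.5 * sum(r * r for r in residual))
        grad = apply_adjoint(K, residual)
        u = project_l1_ball([ui - s * g for ui, g in zip(u, grad)])
    return u, discrepancies

# f = (3, 2) is not attained on the unit l1-ball, since K^{-1} f = (2, 2).
K = [[1.0, 0.5],
     [0.0, 1.0]]
f = [3.0, 2.0]
u, discrepancies = gradient_projection(K, f, s=0.5, n_iter=300)
```

For this example the constrained minimizer can be computed by hand as
$(0.2, 0.8)$, which lies on the boundary of the ball, in accordance with
(5.4).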



6. Conclusions 

We conclude this article with a few remarks on the implications of our
results. We showed that, in many cases, iterative soft-thresholding
algorithms converge with linear rate and moreover that there are situations
in which the constants can be calculated explicitly, see Theorem 3. In
general, however, the factor $\lambda$, which determines the speed within
the class of linearly convergent algorithms, always depends on the operator
$K$ but in the considered cases also on the initial value $u^0$ and a
solution $u^*$. Unfortunately, the dependence on a solution can cause
$\lambda$ to be arbitrarily close to 1, meaning that the iterative
soft-thresholding converges arbitrarily slowly in some sense, which is also
often observed in practice.

One key ingredient for proving the convergence result is the FBI property.
This property also plays a role in the performance analysis of Newton
methods applied to minimization problems with sparsity constraints [21]
and in error estimates for $\ell^1$-regularization [24]. As we have
moreover seen, linear convergence can also be obtained whenever we have
convergence to a solution with strict sparsity pattern. This result is
closely connected with the fact that (1.1), considered on a fixed sign
pattern, is a quadratic problem, and hence the iteration becomes linear
from some index on. The latter observation is also the basis of a couple of
different algorithms [12, 16, 27].

Finally, we want to remark that Theorem 2 on linear convergence of the
generalized gradient projection method holds in general and has been
applied in a special case in order to prove Theorem 1. This generality also
allowed for a unified treatment of the similar algorithms presented in
Section 5 as well as other penalty terms such as powers of certain 2-convex
norms, as discussed in an earlier remark. In all of these situations,
linear convergence follows from descent properties on the one hand and
Bregman(-Taylor) estimates on the other hand.
A. Proof of Theorem 1 (continued)

For the second case, let $u^*$ possess a strict sparsity pattern. Define,
analogously to the above, the subspace
$U = \{v \in \ell^2 : v_k = 0 \text{ if } u_k^* \ne 0\}$. The desired
result then is implied by the fact that there is an $n_0$ such that each
$u^{n+1}$ with $n \ge n_0$ can be written as

\[ u^{n+1} = (I - s_n P_{U^\perp} K^*K P_{U^\perp})(u^n - u^*) + u^* . \]

For this purpose, we introduce the notations $w^n = -K^*(Ku^n - f)$,
$w^* = -K^*(Ku^* - f)$ and recall the optimality condition
$w^* \in \partial\Phi(u^*)$, which can be written as

\[
\begin{aligned}
w_k^* &\in [-\alpha_k, \alpha_k] && \text{if } u_k^* = 0 , \\
w_k^* &= \alpha_k && \text{if } u_k^* > 0 , \\
w_k^* &= -\alpha_k && \text{if } u_k^* < 0 .
\end{aligned}
\]

Due to the assumption that $u^*$ has a strict sparsity pattern,
$w_k^* \in \,]-\alpha_k, \alpha_k[$ if $u_k^* = 0$, and hence there is a
$\rho > 0$ such that

\[ w_k^* \in [-(1 - \rho)\alpha_k, (1 - \rho)\alpha_k] \quad\text{if } u_k^* = 0 \]

since $w_k^* \to 0$ for $k \to \infty$. Also note that $u^n \to u^*$
implies $w^n \to w^*$ and especially pointwise convergence.

We will treat each of the cases $u_k^* = 0$, $u_k^* > 0$ and $u_k^* < 0$
separately.

The case $u_k^* = 0$: First, we find an index $n_1$ such that, for
$n \ge n_1$,

\[ \|u^n - u^*\| \le \frac{\rho}{2}\,\underline{s}\,\alpha , \qquad \|w^n - w^*\| \le \frac{\rho}{2}\,\alpha . \]

So, if $k \in I_0$ with $I_0 = \{k : u_k^* = 0\}$, we have

\[ |u_k^n| \le \frac{\rho}{2} s_n \alpha_k , \qquad |w_k^n| \le |w_k^*| + |w_k^n - w_k^*| \le (1 - \rho)\alpha_k + \frac{\rho}{2}\alpha_k \]

for each $n \ge n_1$. Consequently, for all of these $k$ and $n$,

\[ |u^n - s_n K^*(Ku^n - f)|_k \le s_n \alpha_k , \]

hence the thresholding operation according to (1.2) gives $u_k^{n+1} = 0$
for all $n \ge n_1$ and all $k \in I_0$. Thus, the iteration for
$P_U u^n$ can be expressed by

\[ P_U u^{n+1} = (I - s_n P_{U^\perp} K^*K P_{U^\perp})(u^n - u^*) + P_U u^* \tag{A.1} \]

for all $n \ge n_1 + 1$ since $P_U u^n = P_U u^* = 0$.

The case $u_k^* > 0$: Next, investigate all $k \in I_+$ with
$I_+ = \{k : u_k^* > 0\}$. This has to be a finite set, so there is a
$\delta_+ \in \,]0, \alpha[$ such that $u_k^* \ge \delta_+$ for each such
$k$. So, choose $n_+$ according to the requirement that for all
$n \ge n_+$

\[ \|u^n - u^*\| \le \frac{\delta_+}{2} , \qquad \|w^n - w^*\| \le \frac{\delta_+}{2\bar{s}} . \]

Then, remembering that $w_k^* = \alpha_k$,

\[
\begin{aligned}
u_k^n + s_n w_k^n &= u_k^* + u_k^n - u_k^* + s_n(w_k^n - w_k^*) + s_n w_k^* \\
&\ge u_k^* - |u_k^n - u_k^*| - s_n |w_k^n - w_k^*| + s_n \alpha_k \\
&\ge \delta_+ - \frac{\delta_+}{2} - \frac{s_n \delta_+}{2\bar{s}} + s_n \alpha_k \ge s_n \alpha_k
\end{aligned}
\]

and hence the iteration gives, by $(w^n - w^*) = -K^*K(u^n - u^*)$,

\[
\begin{aligned}
u_k^{n+1} &= u_k^n + s_n w_k^n - s_n \alpha_k \\
&= u_k^n - u_k^* + s_n(w_k^n - w_k^*) + u_k^* \\
&= \bigl((I - s_n K^*K)(u^n - u^*)\bigr)_k + u_k^*
\end{aligned} \tag{A.2}
\]

for all $n \ge n_+$ and all $k \in I_+$.

The case $u_k^* < 0$: Analogously, considering the indices $k \in I_-$ with
$I_- = \{k : u_k^* < 0\}$, one can find an $n_-$ such that

\[ u_k^{n+1} = \bigl((I - s_n K^*K)(u^n - u^*)\bigr)_k + u_k^* \tag{A.3} \]

also holds for all $n \ge n_-$ and all $k \in I_-$.

Choosing $n_0 = \max(n_1 + 1, n_+, n_-)$ and considering (A.1)-(A.3) as
well as remembering that $P_U u^n = 0$ for $n \ge n_0$ yields that indeed

\[ u^{n+1} = (I - s_n P_{U^\perp} K^*K P_{U^\perp})(u^n - u^*) + u^* . \tag{A.4} \]

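Eventually, we can split the iteration into the subspaces
$V = \ker(K P_{U^\perp})$ and $V^\perp$, where $V^\perp$ is taken with
respect to $U^\perp$. For $n \ge n_0$,

\[ P_V u^{n+1} = (P_V - s_n P_V P_{U^\perp} K^*K P_{U^\perp})(u^n - u^*) + P_V u^* = P_V u^n \]

due to the fact that
$V = \ker(K P_{U^\perp}) = \operatorname{rg}(P_{U^\perp} K^*)^\perp$.
Consequently, $P_V u^n = P_V u^*$, since otherwise $u^n \to u^*$ would not
hold. Note that $V^\perp$ is finite-dimensional, hence there is a $c > 0$
such that
$c\|P_{V^\perp} u\|^2 \le \|K P_{U^\perp} P_{V^\perp} u\|^2 = \|K P_{V^\perp} u\|^2$
for all $u \in \ell^2$. Consequently, each of the self-adjoint mappings
$P_{V^\perp} - s_n P_{V^\perp} K^*K P_{V^\perp}$ is a strict contraction on
$V^\perp$:

\[
\begin{aligned}
\sup_{\|P_{V^\perp} u\| = 1} \bigl| \langle (P_{V^\perp} - s_n P_{V^\perp} K^*K P_{V^\perp}) u, P_{V^\perp} u \rangle \bigr|
&= \sup_{\|P_{V^\perp} u\| = 1} \bigl| \|P_{V^\perp} u\|^2 - s_n \|K P_{V^\perp} u\|^2 \bigr| \\
&\le \max\Bigl( \bar{s}\|K\|^2 - 1 ,\ \sup_{\|P_{V^\perp} u\| = 1} \|P_{V^\perp} u\|^2 - s_n c \|P_{V^\perp} u\|^2 \Bigr) \\
&\le \max\bigl( \bar{s}\|K\|^2 - 1 ,\ 1 - \underline{s} c \bigr) = \lambda < 1 .
\end{aligned}
\]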
Using that $u^n - u^* = P_{V^\perp}(u^n - u^*)$ for $n \ge n_0$ gives,
plugged into (A.4),

\[ u^{n+1} - u^* = (P_{V^\perp} - s_n P_{V^\perp} K^*K P_{V^\perp})(u^n - u^*) \]

so

\[ \|u^{n+1} - u^*\|^2 = \|P_{V^\perp}(u^{n+1} - u^*)\|^2 \le \lambda^2 \|P_{V^\perp}(u^n - u^*)\|^2 = \lambda^2 \|u^n - u^*\|^2 , \]

meaning $\|u^n - u^*\| \le \lambda^{n - n_0}\|u^{n_0} - u^*\|$ for
$n \ge n_0$. Finally, it is easy to find a $C > 0$ such that
$\|u^n - u^*\| \le C\lambda^n$ for all $n$.
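The contraction factor $\lambda = \max(\bar{s}\|K\|^2 - 1, 1 - \underline{s}c)$
can be observed directly in a toy problem. The following minimal sketch
assumes a diagonal operator $K = \operatorname{diag}(1, 0.5)$ on
$\mathbb{R}^2$, for which the iteration decouples and the minimizer is
available in closed form; the example and all names are hypothetical and
serve only as an illustration.

```python
import math

def ista_diagonal(d, f, alpha, s, n_iter):
    """Iterative soft-thresholding for K = diag(d); returns all iterates from u = 0."""
    u = [0.0] * len(d)
    iterates = [u]
    for _ in range(n_iter):
        # Gradient step: u_k - s d_k (d_k u_k - f_k).
        w = [uk - s * dk * (dk * uk - fk) for uk, dk, fk in zip(u, d, f)]
        # Soft-thresholding with threshold s * alpha.
        u = [math.copysign(max(abs(wk) - s * alpha, 0.0), wk) for wk in w]
        iterates.append(u)
    return iterates

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d, f, alpha, s = [1.0, 0.5], [1.0, 1.0], 0.1, 1.0
# Minimizer from d_k (d_k u_k - f_k) + alpha = 0 for the positive components.
u_star = [0.9, 1.6]
iterates = ista_diagonal(d, f, alpha, s, 30)
errors = [dist(u, u_star) for u in iterates]
ratios = [errors[n + 1] / errors[n] for n in range(1, 29)]
```

Here $s\|K\|^2 - 1 = 0$ and $c = 0.25$, so the predicted factor is
$\lambda = 1 - sc = 0.75$; after the first step the error ratios in this
sketch agree with that value up to rounding.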



B. Proof of Theorem 3

Proof of Theorem 3. Note that $\sigma_k > 0$ because of the FBI property
and that $\mu_k \to 0$ as $k \to \infty$ since $K$ is compact (otherwise
there would be a bounded sequence which converges weakly to zero with
images not converging in the strong sense).

Our aim is to compute a constant $c_1 > 0$ such that
$c_1\|P_k(v - u^*)\|^2 \le R(v)$ on a suitable bounded set and for a
suitable $k$. Here, $P_k$ denotes the orthogonal projection onto the
subspace $\{u \in \ell^2 : u_l = 0 \text{ for } l < k\}$. We can assume
without loss of generality that $f \ne 0$ and thus estimate the norm of
$Ku^* - f$:

\[ \frac{\|Ku^* - f\|^2}{2} \le (F + \Phi)(u^*) \le (F + \Phi)(0) = \frac{\|f\|^2}{2} \quad\Longrightarrow\quad \|Ku^* - f\| \le \|f\| . \]

Because the index $k_0$ is chosen such that
$\mu_{k_0} \le \alpha^2/(4\|f\|^2)$, we can estimate

\[
\|P_{k_0} K^*(Ku^* - f)\| = \sup_{\|v\| \le 1} \langle Ku^* - f, K P_{k_0} v \rangle
\le \sup_{\|v\| \le 1} \|Ku^* - f\| \, \|K P_{k_0} v\| \le \sqrt{\mu_{k_0}}\,\|f\| \le \alpha/2
\]

and consequently, $w^* = -K^*(Ku^* - f)$ satisfies $|w_k^*| \le \alpha/2$
for each $k \ge k_0$. Recall from the proof of Lemma 3 that this in
particular means that $u_k^* = 0$, so one obtains the estimate

\[
R(v) \ge \sum_{k \ge k_0} \bigl( \alpha_k |v_k| - w_k^* v_k \bigr)
\ge \frac{\alpha}{2} \sum_{k \ge k_0} |v_k - u_k^*| \ge \frac{\alpha}{2} \|P_{k_0}(v - u^*)\| .
\]

We assumed that the first iterate is $u^0 = 0$, so
$(F + \Phi)(u^n) \le \|f\|^2/2$. Consequently,
$\|P_{k_0} v\|^{-1} \ge 2\alpha\|f\|^{-2}$ whenever
$\Phi(v) \le (F + \Phi)(0)$, so

\[ R(v) \ge \alpha^2 \|f\|^{-2} \|P_{k_0}(v - u^*)\|^2 . \]

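An estimate for the Taylor distance $T$ is found with the help of
$\sigma_{k_0}$:

\[
\begin{aligned}
T(v) = \frac{\|K(v - u^*)\|^2}{2}
&\ge \frac{\|K Q_{k_0}(v - u^*)\|^2}{4} - \frac{\|K\|^2 \|P_{k_0}(v - u^*)\|^2}{2} \\
&\ge \frac{\sigma_{k_0}}{4} \bigl( \|Q_{k_0}(v - u^*)\|^2 + \|P_{k_0}(v - u^*)\|^2 \bigr) - \frac{\sigma_{k_0} + 2\|K\|^2}{4} \|P_{k_0}(v - u^*)\|^2 ,
\end{aligned}
\]

where $Q_{k_0} = I - P_{k_0}$. Rearranging terms gives

\[ \frac{\sigma_{k_0}}{4} \|v - u^*\|^2 \le \max\Bigl( 1, \frac{(\sigma_{k_0} + 2\|K\|^2)\|f\|^2}{4\alpha^2} \Bigr) \bigl( R(v) + T(v) \bigr) , \]

leading to the desired constant $c$ in Proposition 2:

\[ \|v - u^*\|^2 \le \max\Bigl( \frac{4}{\sigma_{k_0}}, \frac{(\sigma_{k_0} + 2\|K\|^2)\|f\|^2}{\sigma_{k_0}\alpha^2} \Bigr) \bigl( R(v) + T(v) \bigr) , \]

namely
$c = \max\bigl( \tfrac{4}{\sigma_{k_0}}, \tfrac{(\sigma_{k_0} + 2\|K\|^2)\|f\|^2}{\sigma_{k_0}\alpha^2} \bigr)$.
Estimating $\lambda$ according to (3.10) with constant step-size
$s = \|K\|^{-2}$ and $\delta = 1 - s\|K\|^2/2 = 1/2$ yields

\[
\lambda^2 \le 1 - \frac{1}{4 + 2c\|K\|^2}
= \max\Bigl( 1 - \frac{\sigma_{k_0}}{4\sigma_{k_0} + 8\|K\|^2},\;
1 - \frac{\sigma_{k_0}\alpha^2}{4\sigma_{k_0}\alpha^2 + 2(\sigma_{k_0} + 2\|K\|^2)\|K\|^2\|f\|^2} \Bigr) .
\]

□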



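Remark B.1. The proof of Proposition 2 also establishes
$\|u^n - u^*\| \le (c r_0)^{1/2}\lambda^n$, which implies in turn, by
estimating $r_0 \le (F + \Phi)(0) = \|f\|^2/2$ and the maximum by the sum,
the a-priori estimate

\[
\|u^n - u^*\| \le \Bigl( \frac{4\alpha^2\|f\|^2 + (\sigma_{k_0} + 2\|K\|^2)\|f\|^4}{2\sigma_{k_0}\alpha^2} \Bigr)^{1/2}
\max\Bigl( 1 - \frac{\sigma_{k_0}}{4\sigma_{k_0} + 8\|K\|^2},\;
1 - \frac{\sigma_{k_0}\alpha^2}{4\sigma_{k_0}\alpha^2 + 2(\sigma_{k_0} + 2\|K\|^2)\|K\|^2\|f\|^2} \Bigr)^{n/2} .
\]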
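The quantities entering this estimate are easy to evaluate when the
operator is diagonal. The following minimal sketch assumes
$K = \operatorname{diag}(d)$ with decreasing entries, reads $\sigma_k$ as
the infimum of $d_l^2$ over $l < k$ and $\mu_k$ as the supremum of $d_l^2$
over $l \ge k$ (with 0-based indices), and evaluates the rate from the
expression for $\lambda^2$ in the proof of Theorem 3. The diagonal model,
the truncation to 50 entries and all names are assumptions made for
illustration only.

```python
import math

def theorem3_rate(d, f_norm, alpha):
    """Return (k0, sigma_k0, lambda) for K = diag(d), using 0-based indices."""
    op_norm_sq = max(x * x for x in d)  # ||K||^2 of a diagonal operator
    bound = alpha ** 2 / (4.0 * f_norm ** 2)
    # mu_k = sup of d_l^2 over l >= k; pick the smallest k with mu_k <= bound.
    k0 = next(k for k in range(1, len(d)) if max(x * x for x in d[k:]) <= bound)
    # sigma_k0 = inf of d_l^2 over l < k0.
    sigma = min(x * x for x in d[:k0])
    first = 1.0 - sigma / (4.0 * sigma + 8.0 * op_norm_sq)
    second = 1.0 - sigma * alpha ** 2 / (
        4.0 * sigma * alpha ** 2
        + 2.0 * (sigma + 2.0 * op_norm_sq) * op_norm_sq * f_norm ** 2
    )
    return k0, sigma, math.sqrt(max(first, second))

# Diagonal entries 1, 1/2, 1/3, ... as a stand-in for a compact operator.
d = [1.0 / (l + 1) for l in range(50)]
k0, sigma, lam = theorem3_rate(d, f_norm=1.0, alpha=0.5)
```

For these sample values the resulting factor is very close to 1, which
illustrates that such a-priori rates, while explicit, can be pessimistic.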